
    Bayesian Additive Regression Trees With Parametric Models of Heteroskedasticity

    We incorporate heteroskedasticity into Bayesian Additive Regression Trees (BART) by modeling the log of the error variance as a linear function of prespecified covariates. Under this scheme, the Gibbs sampling procedure for the original sum-of-trees model is easily modified, and the parameters of the variance model are updated via a Metropolis-Hastings step. We demonstrate the promise of our approach by providing more appropriate posterior predictive intervals than homoskedastic BART in heteroskedastic settings and by demonstrating the model's resistance to overfitting. Our implementation will be offered in an upcoming release of the R package bartMachine.
    Comment: 20 pages, 5 figures
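The variance model described above can be sketched in a few lines. This is a minimal illustration, not the bartMachine implementation: it assumes a flat prior on the variance coefficients (so the posterior is proportional to the likelihood), conditions on fixed residuals from the tree ensemble, and uses hypothetical names (`log_lik`, `mh_update`) for a random-walk Metropolis-Hastings update of the coefficients gamma in log(sigma_i^2) = z_i' gamma.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_lik(gamma, resid, Z):
    """Gaussian log-likelihood with log-variance log(sigma_i^2) = Z_i @ gamma."""
    log_var = Z @ gamma
    return -0.5 * np.sum(log_var + resid**2 / np.exp(log_var))

def mh_update(gamma, resid, Z, step=0.1):
    """One random-walk Metropolis-Hastings step for the variance coefficients
    (flat prior assumed, so the acceptance ratio is a likelihood ratio)."""
    prop = gamma + step * rng.standard_normal(gamma.shape)
    log_accept = log_lik(prop, resid, Z) - log_lik(gamma, resid, Z)
    if np.log(rng.uniform()) < log_accept:
        return prop
    return gamma

# toy residuals whose variance grows with the covariate z
n = 500
Z = np.column_stack([np.ones(n), rng.uniform(0, 2, n)])
true_gamma = np.array([-1.0, 1.5])
resid = rng.standard_normal(n) * np.exp(0.5 * Z @ true_gamma)

gamma = np.zeros(2)
for _ in range(5000):
    gamma = mh_update(gamma, resid, Z)
```

In the full sampler this step would alternate with the usual Gibbs draws for the sum-of-trees model, with the heteroskedastic variances entering the tree updates as observation weights.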

    Matching on-the-fly in Sequential Experiments for Higher Power and Efficiency

    We propose a dynamic allocation procedure that increases power and efficiency when measuring an average treatment effect in sequential randomized trials. Subjects arrive iteratively and are either randomized or paired via a matching criterion to a previously randomized subject and administered the alternate treatment. We develop estimators for the average treatment effect that combine information from both the matched pairs and the unmatched subjects, as well as an exact test. Simulations illustrate the method's higher efficiency and power over competing allocation procedures in both controlled scenarios and historical experimental data.
    Comment: 20 pages, 1 algorithm, 2 figures, 8 tables
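The allocation rule above can be sketched as follows. This is a simplification under stated assumptions: it uses plain Euclidean distance with a hypothetical `threshold` in place of the paper's matching criterion, and the names (`allocate`, `reservoir`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def allocate(x, reservoir, threshold=0.5):
    """If an unmatched prior subject is within `threshold` of the new
    subject's covariates, pair them and give the alternate treatment;
    otherwise randomize and hold the subject for future matching."""
    if reservoir:
        dists = [np.linalg.norm(x - xr) for xr, _ in reservoir]
        i = int(np.argmin(dists))
        if dists[i] < threshold:
            xr, t_partner = reservoir.pop(i)
            return 1 - t_partner, "matched"
    t = int(rng.integers(2))            # fair coin flip
    reservoir.append((x, t))
    return t, "randomized"

reservoir = []                           # unmatched, previously randomized subjects
assignments = [allocate(rng.standard_normal(2), reservoir) for _ in range(100)]
```

Estimation would then combine a matched-pairs estimator over the pairs with a completely-randomized estimator over whatever remains in the reservoir when the trial ends.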

    Statistical Analysis and Design of Crowdsourcing Applications

    This thesis develops methods for the analysis and design of crowdsourced experiments and crowdsourced labeling tasks. Much of this document focuses on applications, including running natural field experiments, estimating the number of objects in images, and collecting labels for word sense disambiguation. Observed shortcomings of the crowdsourced experiments inspired the development of methodology for running more powerful experiments via matching on-the-fly. Using the label data to estimate response functions inspired work on non-parametric function estimation using Bayesian Additive Regression Trees (BART). This work then inspired extensions to BART, such as the incorporation of missing data, as well as a user-friendly R package.

    bartMachine: Machine Learning with Bayesian Additive Regression Trees

    We present a new package in R implementing Bayesian additive regression trees (BART). The package introduces many new features for data analysis using BART, such as variable selection, interaction detection, model diagnostic plots, incorporation of missing data, and the ability to save trees for future prediction. It is significantly faster than the current R implementation, parallelized, and capable of handling both large sample sizes and high-dimensional data.

    Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation

    This article presents Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. Classical partial dependence plots (PDPs) help visualize the average partial relationship between the predicted response and one or more features. In the presence of substantial interaction effects, however, the partial response relationship can be heterogeneous, so an average curve such as the PDP can obfuscate the complexity of the modeled relationship. Accordingly, ICE plots refine the partial dependence plot by graphing the functional relationship between the predicted response and the feature for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate, suggesting where and to what extent heterogeneities might exist. In addition to providing a plotting suite for exploratory analysis, we include a visual test for additive structure in the data generating model. Through simulated examples and real data sets, we demonstrate how ICE plots can shed light on estimated models in ways PDPs cannot. The procedures outlined are available in the R package ICEbox.
    Comment: 22 pages, 14 figures, 2 algorithms
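The core ICE computation is small enough to sketch directly. This is an illustrative version, not the ICEbox implementation: `ice_curves` is a hypothetical name, and the toy model below is chosen so that an interaction makes the average curve (the PDP) flat while the individual ICE curves are not.

```python
import numpy as np

def ice_curves(predict, X, j, grid):
    """For each row of X, sweep column j over `grid` and predict,
    yielding one curve per observation (rows = observations, cols = grid)."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, j] = v                 # everyone gets the same value of feature j
        curves[:, k] = predict(Xv)   # other features stay at observed values
    return curves

# toy model with an interaction: the effect of x0 flips sign with x1,
# so the PDP is roughly flat even though every ICE curve has slope +/-1
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(-1, 1, 200), rng.choice([-1.0, 1.0], 200)])
predict = lambda X: X[:, 0] * X[:, 1]

grid = np.linspace(-1, 1, 21)
curves = ice_curves(predict, X, 0, grid)
pdp = curves.mean(axis=0)            # the classical PDP is the average ICE curve
```

Plotting each row of `curves` against `grid` reveals the two diverging bundles of lines that the flat PDP hides, which is exactly the heterogeneity the article describes.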